Abstract



A feature of multinomial models with unknown index N is that the dimension of the
parameter space potentially depends on N, a complication when fitting models by
Markov chain Monte Carlo (MCMC). Two commonly used approaches to this problem
are: (i) trans-dimensional reversible jump MCMC and (ii) superpopulation data
augmentation. A distinguishing feature of the two approaches is that N, and combinatorial
terms involving N, are not explicit in the superpopulation likelihood. To resolve
ambiguity about the relationship between the two approaches we compare them analytically.
We show that superpopulation data augmentation is equivalent to trans-dimensional
sampling but with a restricted prior on N. We highlight potential drawbacks that
result from not making N explicit in the likelihood in the superpopulation approach.
One advantage of the superpopulation approach has been the availability of easy to
use BUGS code. We provide simple BUGS code that implements trans-dimensional
reversible jump MCMC for the mark-recapture model that can be readily extended
to related models.

Key Words: BUGS/JAGS; Capture-recapture; Data augmentation; Heterogeneity;

1 Introduction



Inference about N, the unknown size of a population, based on capture-recapture
models is of enormous interest, with ecology and epidemiology just two areas in which
there are many applications. This is a specific case of a more general class of problems
in which the index of a multinomial model is unknown. A sub-class of models of
particular interest are the heterogeneity models (Mh) in which capture probabilities vary
among individuals. Link (2003) has shown that inference about N within this class of
models is sensitive to the choice of model for describing individual variation.

A drawback of much available software for fitting Mh is that analyses are restricted
to default choices of model for the individual capture probabilities. Restricting analyses
based on available computer code seems undesirable and an advantage of Bayesian
approaches is that hierarchical modelling and data augmentation implemented via Markov
chain Monte Carlo (MCMC) provide a natural framework for general model fitting.
A commonly used dataset for illustrating model Mh is the snowshoe hare data
provided by Otis et al. (1978) in which 68 individuals were captured in six days of
trapping with encounter frequencies $f = (25, 22, 13, 5, 1, 2)'$ where $f_i$ $(i = 1, \ldots, k)$ is
the number of hares caught i times during the k = 6 days of the study. Define N as the
population size, $y_i$ as the number of times that individual i was caught and $p_i$ as the
corresponding capture probability. To model these using a program such as BUGS
(Lunn et al. 2000) or JAGS (Plummer 2003) we would like to be able to specify the
model simply as

$$y_i \sim \mathrm{Bin}(k, p_i)$$

for i in 1 : N (herein, unless we specify the two programs separately, we will use
BUGS to refer to BUGS and JAGS jointly). To complete the model specification we
would then specify a suitable distribution for $p_i$, hyperpriors for any parameters of this
distribution and a prior for N. However two related things prevent us from doing this:
(1) BUGS does not allow the index of the loop to be stochastic and (2) the dimension
of p changes each time we update N to a new value.
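As a minimal sketch of the data layout that a fixed loop index implies (written in Python rather than BUGS, with hypothetical helper names `expand_frequencies` and `pad_to_upper_bound`), the encounter frequencies quoted above can be expanded to one count per observed hare and padded with zeros for individuals that were never caught. The upper bound M = 200 is an arbitrary illustrative assumption, not a value taken from the text.

```python
def expand_frequencies(f):
    """Turn frequencies f[i-1] = number caught i times into per-individual counts."""
    y = []
    for times_caught, count in enumerate(f, start=1):
        y.extend([times_caught] * count)
    return y


def pad_to_upper_bound(y, M):
    """Append zeros so the count vector has length M (never-caught individuals have y = 0)."""
    if M < len(y):
        raise ValueError("M must be at least the number of observed individuals")
    return y + [0] * (M - len(y))


f = [25, 22, 13, 5, 1, 2]                 # snowshoe hare frequencies, k = 6 occasions
y_obs = expand_frequencies(f)             # one entry per observed hare
y_aug = pad_to_upper_bound(y_obs, M=200)  # M = 200 is an illustrative assumption
```

With a vector of fixed length M the loop index no longer needs to be stochastic; which of the zero entries count as members of the population is then handled by the model rather than by the loop.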



Royle et al. (2007) provide a neat solution that is easily implemented in BUGS
based on a superpopulation. However, a drawback of their model and code is that
N does not appear in the likelihood; rather it is a derived parameter. This makes
specification of priors on N more of a challenge than if N appears explicitly in the
likelihood.

Here we contrast approaches to fitting Mh, using the models of King and Brooks (2008)
and Royle et al. (2007) when using Bayesian methods. The approach taken by
Durban and Elston (2005), Schofield and Barker (2008) and King and Brooks (2008), as
well as Wright et al. (2009) for similar models, is to consider a trans-dimensional
(TD) algorithm (Carlin and Chib 1995, Green 1995) such as the reversible jump MCMC
(RJMCMC) algorithm (Green 1995). In contrast, Royle et al. (2007) as well as
Royle and Dorazio (2008), Link and Barker (2010) and Schofield and Barker (2010) use
a data augmentation (DA) approach (Tanner and Wong 1987) to fit the model.



Our purpose is to: 

1. Clarify the distinction between the different likelihoods used to fit models by 
MCMC. 

2. Provide simple BUGS code for Mh that explicitly includes N as a parameter. 

3. Consider the comment made by Royle et al. (2007) that their approach is not a
case of trans-dimensional MCMC and fundamentally differs from approaches that
use RJMCMC.
2 Mh Likelihoods

Likelihoods for model Mh include an integrated likelihood (Burnham and Overton 1978,
Otis et al. 1978), a "full" likelihood (Pledger 2000, King and Brooks 2008), or
a super-population likelihood (Royle et al. 2007). The three most obvious differences
between these approaches are (i) whether or not N is explicitly included in the
likelihood, (ii) the presence or absence of combinatorial terms involving N, and (iii) the
presence or absence of a superpopulation (with corresponding value M) in the model.



The likelihood of Royle et al. (2007) does not include N, or combinatorial terms
involving N, but does include M. The full likelihood approach of Pledger (2000) explicitly
includes N and associated combinatorial terms, does not include M and treats the
vector of capture probabilities p as parameters. Burnham and Overton (1978) explicitly
includes N, does not include M and models p as random effects; combinatorial terms
appear once the capture probabilities are integrated out of the likelihood.



Table 1 about here 



2.1 Burnham's Integrated Likelihood 



A complete data model can be written as $\Pr(X = x \mid P = p)$; as shorthand we use $[x \mid p]$.
Both x and p are unobserved, although we know values of some rows of x through the
observed data matrix $x^{obs}$. Following Burnham and Overton (1978) we can write the
model as

$$[x \mid p] = \prod_{i=1}^{N} \prod_{j=1}^{k} p_i^{x_{ij}} (1 - p_i)^{1 - x_{ij}} \tag{1}$$

under the assumption that, conditional on P = p, capture events are independent
among individuals and occasions. Conditional on N, which is treated as a parameter,
the number of rows of x is now known.

As noted by Burnham and Overton (1978) this model is over-parameterized and not
useful for estimation. The usual remedy for over-parameterization is to model P as
random effects sampled from a distribution with pdf $f_P(p \mid \theta_p)$ for $p \in [0, 1]$. We then
integrate over p to obtain the integrated likelihood given by Burnham and Overton
(1978); this is the observed data likelihood (Gelman et al. 2004).
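To make the integration step concrete, the following Python sketch evaluates an integrated likelihood on the log scale when the random effects distribution is Beta(a, b). The Beta choice is an assumption made here only because the integral then has a closed form; the function names are hypothetical, and the likelihood is written in terms of the encounter frequencies, so it carries binomial coefficients that do not depend on N or on (a, b).

```python
from math import comb, exp, lgamma, log


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_pi(j, k, a, b):
    """log of the integral of p^j (1 - p)^(k - j) dF(p) when F is Beta(a, b)."""
    return log_beta(a + j, b + k - j) - log_beta(a, b)


def integrated_loglik(N, f, a, b):
    """Log integrated likelihood of N and (a, b) given frequencies f[j-1], j = 1..k."""
    k, n = len(f), sum(f)
    if N < n:
        return float("-inf")
    # Multinomial coefficient N! / ((N - n)! * prod_j f_j!)
    out = lgamma(N + 1) - lgamma(N - n + 1) - sum(lgamma(fj + 1) for fj in f)
    # The N - n individuals never caught each contribute the j = 0 integral
    out += (N - n) * log_pi(0, k, a, b)
    # The cell probability for "caught j times" is C(k, j) times the integral
    for j, fj in enumerate(f, start=1):
        out += fj * (log(comb(k, j)) + log_pi(j, k, a, b))
    return out


f = [25, 22, 13, 5, 1, 2]
# Cell probabilities over j = 0..6 sum to one (a = 1, b = 2 chosen for illustration)
total = sum(comb(6, j) * exp(log_pi(j, 6, 1.0, 2.0)) for j in range(7))
profile = {N: integrated_loglik(N, f, a=1.0, b=2.0) for N in range(68, 201)}
```

Here `profile` traces the log-likelihood over N for one fixed (a, b); in practice (a, b) would be estimated jointly with N.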



2.2 Complete Data Likelihood (CDL) 

The model (1) is underspecified. Adding the random effects distribution for P we can
write a CDL as

$$[x \mid p][p \mid \theta_p] = [p \mid \theta_p] \times \prod_{i=1}^{N} \prod_{j=1}^{k} p_i^{x_{ij}} (1 - p_i)^{1 - x_{ij}} \tag{2}$$

As before, both x and p are unobserved. Associated with $x^{obs}$ is a subspace of $\mathcal{S}$,
which we denote by $\Omega(x)$, of capture history matrices that are identical in appearance
to $x^{obs}$. The only differences among elements of $\Omega(x)$ are that the row indices are
permuted.

Under the assumption that $X_i \mid p_i$ and $X_j \mid p_j$ are independent for all $i \neq j$, it follows
that the x's are exchangeable given the p's and that the elements in $\Omega(x)$ have identical
joint probabilities across their rows. A heuristic explanation is that once we have drawn
$(x_i, p_i)$ pairs from their joint distribution they are linked - when we permute the x's
the p's get permuted with them.

It now follows that

$$[N, p \mid x^{obs}, \theta_p] \propto [x^{obs} \mid p][p \mid \theta_p] = \frac{N!}{\prod_{h \in H} z_h!} \, [p \mid \theta_p] \prod_{i=1}^{N} \prod_{j=1}^{k} p_i^{x_{ij}} (1 - p_i)^{1 - x_{ij}} \tag{3}$$

is also a likelihood where $N! / \prod_{h \in H} z_h!$ is the dimension of $\Omega(x)$. The likelihood (3) is not the
observed data likelihood (ODL) as it still contains unobserved elements. If we integrate
across the random effects we obtain the ODL given by Burnham and Overton (1978).
However, the likelihood (3) is useful in Bayesian inference carried out by MCMC in that
explicit modelling in terms of latent components can be a useful device for improving
mixing.
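The combinatorial factor in (3) can be computed directly from a capture history matrix. The Python sketch below uses hypothetical function names and a small made-up matrix (not the hare data); it counts the $z_h$, including the all-zero history, and evaluates the log of $N!/\prod_h z_h!$ together with the log of the Bernoulli product. The $[p \mid \theta_p]$ factor is left out because it depends on the chosen random effects distribution.

```python
from collections import Counter
from math import exp, lgamma, log


def log_dim_omega(x):
    """log of N! / prod_h z_h!, the number of distinct row orderings of x."""
    z = Counter(tuple(row) for row in x)  # z_h for every history h that occurs
    return lgamma(len(x) + 1) - sum(lgamma(c + 1) for c in z.values())


def log_bernoulli_product(x, p):
    """log of prod_i prod_j p_i^x_ij (1 - p_i)^(1 - x_ij)."""
    return sum(
        log(p_i) if x_ij else log(1.0 - p_i)
        for row, p_i in zip(x, p)
        for x_ij in row
    )


# Illustrative data: N = 4 individuals, k = 3 occasions, one individual never caught
x = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]
p = [0.5, 0.5, 0.25, 0.25]
n_orderings = round(exp(log_dim_omega(x)))  # 4! / (2! 1! 1!)
log_lik_3 = log_dim_omega(x) + log_bernoulli_product(x, p)
```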



2.3 Super Population CDL 



Royle et al. (2007) describe a CDL in terms of a super-population M (essentially an
upper bound on the population size N), with X now an M by k matrix. The idea is
that we replace N in the likelihood with the outcomes w of the random vector W of
dimension M where $W_i$ takes the value 1 if individual i is included in the population
and 0 otherwise such that $N = \sum_{i=1}^{M} W_i$. By assumption, W and P are independent
given their parameters. The complete model on which the Royle et al. (2007) approach
is based can be written as

$$
\begin{aligned}
[x, w, p \mid \theta_p, \psi] &= [w \mid \psi][p \mid w, \theta_p][x \mid w, p] \\
&= [w \mid \psi][p \mid \theta_p][x \mid w, p] \\
&= \prod_{i=1}^{M} \left\{ \prod_{j=1}^{k} (p_i w_i)^{x_{ij}} (1 - p_i w_i)^{1 - x_{ij}} \right\} f_P(p_i) \, \psi^{w_i} (1 - \psi)^{1 - w_i} \\
&= \prod_{i=1}^{M} (p_i w_i)^{y_i} (1 - p_i w_i)^{k - y_i} f_P(p_i) \, \psi^{w_i} (1 - \psi)^{1 - w_i}
\end{aligned}
\tag{4}
$$

The sample space $\mathcal{S}$ for x is now of dimension $2^{Mk}$ and the sample space $\mathcal{W}$ for w of
dimension $2^{M}$. The sample space for p is given by $\mathcal{P} = [0, 1]^{M}$.

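Denoting the cdf for p by $F(p) = F_{\theta_p}(p)$ and integrating over p and w, we obtain

$$[x^{obs}, N \mid \theta_p, \psi, M] = \frac{N!}{\prod_{h \neq 0} z_h! \, (N - n)!} \; \pi_{0F}^{N - n} \prod_{j=1}^{k} \pi_{jF}^{f_j} \times \binom{M}{N} \psi^{N} (1 - \psi)^{M - N}$$

where $\pi_{jF} = \int p^{j} (1 - p)^{k - j} \, dF(p)$. That is, we have Burnham's marginal likelihood
multiplied by the prior for N induced by the model $W_i \sim \mathrm{Bern}(\psi)$.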

Inference in the two approaches will be identical for the appropriate choice of prior
on $\psi$ in the Royle et al. (2007) likelihood. For example,

$$f(\psi) \propto 1$$

corresponds to a discrete uniform prior on N as shown by Royle et al. (2007). Another
choice could be:

$$f(\psi) \propto \frac{1}{\psi}$$

which leads to $f(N) \propto 1/N$ but with the upper bound M. This is Jeffreys' prior for N
but truncated to the range [0, M]. In general, the prior on N induced in the
super-population approach is the marginal distribution for N arising from a binomial mixture,
with mixing distribution given by the prior on $\psi$. While it is relatively easy to construct
reference priors of the form given above, constructing general priors on N that are not
binomial mixtures will be more difficult. We do not wish to overstate this problem as
the binomial mixtures provide a flexible class of priors including the beta-binomial and
the negative-binomial as a limiting case.
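The two induced priors mentioned above can be checked with exact arithmetic. The Python sketch below (hypothetical function names; M = 12 is an arbitrary small value chosen for illustration) integrates the Binomial(M, $\psi$) probability of N against each prior on $\psi$ using the Beta function at integer arguments.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def induced_prior_uniform_psi(M):
    """f(N) for N = 0..M when psi ~ Uniform(0, 1)."""
    # Integral of C(M, N) psi^N (1 - psi)^(M - N) over psi
    return [comb(M, N) * beta_int(N + 1, M - N + 1) for N in range(M + 1)]


def induced_kernel_inverse_psi(M):
    """Unnormalised f(N) for N = 1..M when f(psi) is proportional to 1 / psi."""
    # Integral of C(M, N) psi^(N - 1) (1 - psi)^(M - N) over psi
    return [comb(M, N) * beta_int(N, M - N + 1) for N in range(1, M + 1)]


M = 12  # illustrative upper bound
uniform_case = induced_prior_uniform_psi(M)    # every entry equals 1 / (M + 1)
inverse_case = induced_kernel_inverse_psi(M)   # entry for N equals 1 / N
```

The second function starts at N = 1 because the integral diverges at N = 0 under the improper prior.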



3 Gibbs Sampling and Model Mh 



Durban and Elston (2005), King and Brooks (2008) and Schofield and Barker (2008)
all used a trans-dimensional (RJMCMC) approach to fit Mh using the CDL described
above. Wright et al. (2009) used a similar approach in an application where individual
identities were uncertain. In this approach, sampling from the full conditional for N
includes an explicit reversible-jump step in the sense of Green (1995). In contrast,
the reversible-jump step is unnecessary in the super-population approach of
Royle et al. (2007).

To describe what we call the "standard approach" to Gibbs sampling we start with
the complete data likelihood (2). Although we describe model fitting from a Bayesian
stand-point, we could implement a Frequentist approach using an EM-algorithm
instead of Gibbs sampling.

For simplicity we assume $\theta_p$ is known. A Gibbs sampler can be constructed by
alternating sampling from the following distributions,

1. The full conditional distribution for $p_i$, $\forall i$:

$$[p_i \mid x, N, \theta_p] \propto \prod_{j=1}^{k} p_i^{x_{ij}} (1 - p_i)^{1 - x_{ij}} \, f(p_i \mid \theta_p).$$

Depending on $[p \mid \theta_p]$, we can either sample from this distribution directly, or use
another simulation-based sampler, such as the Metropolis-Hastings algorithm
or rejection sampling.

2. The full conditional distribution for N:

$$[N \mid x, p, \theta_p] \propto \frac{N!}{\prod_{h \in H} z_h!} \, [p \mid \theta_p] \prod_{i=1}^{N} \prod_{j=1}^{k} p_i^{x_{ij}} (1 - p_i)^{1 - x_{ij}} \, f(N).$$

This update is more difficult since the value of N changes the dimension of p, so we
use a TD algorithm. Following Schofield and Barker (2008) we use a special case
of the RJMCMC algorithm. We first propose a new value N* from $J(N^* \mid N)$ and
specify p* to be a vector of length N* corresponding to the capture probabilities
for the N* individuals. If N* > N, we set $p_i^* = p_i$ for all $i = 1, \ldots, N$ and generate $p_i^*$ for
$i = N + 1, \ldots, N^*$ from the prior distribution $f(p \mid \theta_p)$. Generating p* in such a way
simplifies the RJMCMC algorithm. The acceptance probability is $\min(1, q)$ with

$$q = \frac{N^*! \, (N - n)!}{(N^* - n)! \, N!} \prod_{i=N+1}^{N^*} (1 - p_i^*)^{k} \; \frac{J(N \mid N^*)}{J(N^* \mid N)} \; \frac{f(N^*)}{f(N)}.$$

If our proposed value N* < N we obtain p* by removing the last $N - N^*$ rows
of p. The acceptance probability becomes $\min(1, q)$ with

$$q = \frac{N^*! \, (N - n)!}{(N^* - n)! \, N!} \; \frac{1}{\prod_{i=N^*+1}^{N} (1 - p_i)^{k}} \; \frac{J(N \mid N^*)}{J(N^* \mid N)} \; \frac{f(N^*)}{f(N)} \; I(N^* \geq n)$$

where $I(\cdot)$ is an indicator function. Importantly, we note that the full RJMCMC
expression has been reduced to a standard Metropolis-Hastings like expression
that consists of a likelihood ratio x jumping distribution ratio x prior ratio.
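The update for N described in step 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the jumping distribution is taken to be a symmetric random walk on the integers (so the J ratio equals 1), the vector p is assumed to hold the n observed individuals first, and `log_prior_N` and `draw_p` are hypothetical user-supplied functions for $\log f(N)$ and for a draw from $f(p \mid \theta_p)$.

```python
import math
import random


def log_comb_term(N, n):
    """log of N! / (N - n)!, the part of the combinatorial term that depends on N."""
    return math.lgamma(N + 1) - math.lgamma(N - n + 1)


def log_accept_ratio(N, N_star, n, k, p_extra, log_prior_N):
    """log q for a move N -> N_star under a symmetric jumping distribution.

    p_extra holds the capture probabilities of the never-caught individuals that
    are added (N_star > N, drawn from the prior) or removed (N_star < N).
    """
    if N_star < n:
        return -math.inf  # indicator I(N* >= n)
    log_q = log_comb_term(N_star, n) - log_comb_term(N, n)
    log_miss = sum(k * math.log1p(-p_i) for p_i in p_extra)  # log prod (1 - p_i)^k
    log_q += log_miss if N_star > N else -log_miss
    return log_q + log_prior_N(N_star) - log_prior_N(N)


def update_N(N, p, n, k, log_prior_N, draw_p, rng, max_step=3):
    """One Metropolis-Hastings update of N; returns the new pair (N, p)."""
    N_star = N + rng.choice((-1, 1)) * rng.randint(1, max_step)
    if N_star < n:
        return N, p  # proposal has zero posterior mass, so it is rejected
    if N_star > N:
        p_extra = [draw_p(rng) for _ in range(N_star - N)]
    else:
        p_extra = p[N_star:N]
    log_q = log_accept_ratio(N, N_star, n, k, p_extra, log_prior_N)
    if rng.random() < math.exp(min(0.0, log_q)):
        return N_star, (p + p_extra if N_star > N else p[:N_star])
    return N, p
```

Only the capture probabilities that enter or leave the model appear in the ratio, which is why the step reduces to the Metropolis-Hastings form noted above.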

If we also wish to estimate $\theta_p$, we simply need to (i) specify a prior distribution for
$\theta_p$, and (ii) sample from the full conditional distribution for $\theta_p$ in turn. Although we
have never found it necessary, more complicated RJMCMC setups could be specified
in order to speed up the RJMCMC algorithm. For more extensive details of RJMCMC
we direct the reader to Gelman et al. (2004).



A strength of this approach is the ease with which we can incorporate extensions
to the model. For example, if the tags are themselves uncertain we need to make only
one adjustment to the algorithm above: we add a step where we sample from the full
conditional distribution for all of the uncertain tags (see Wright et al. 2009 for details).
We do not need to alter the steps we already have because conditional on x the terms
$z_h!$ are fixed and cancel out of the above distributions. With N explicit in the model
as a parameter it is also easy to include different priors for N, including hierarchical
priors, in the usual way. For example, if mark-recapture is carried out across distinct
subpopulations it would be straightforward to model N for each population in terms
of an appropriate spatial model.



One perceived weakness of this TD approach is that it is difficult to incorporate
into BUGS, in contrast to the super population approach (Royle et al. 2007). However,
this is not the case, so long as we specify a prior for N that has an upper bound M
(Durban and Elston 2005). We provide BUGS code for model Mh (Figure 1) that is
just as simple and requires similar programming effort to the code given by
Royle et al. (2007). A full description of the model, including the data and initial values files,
is available for both BUGS and JAGS at www.maramatanga.com. We also provide
alternative model/data files for model Mh that allow for easier specification of the data,
albeit with a longer model statement.

Note that specifying an upper limit is required only for easy implementation in
BUGS; there is no superpopulation in the Royle et al. (2007) sense. We can achieve
an approximation to any distribution for N with support on the non-negative integers
by including a large M and truncating (JAGS) or censoring (BUGS) the distribution
at M (although very large values for M will slow model fitting). We find that both
(i) the time taken to run the BUGS code and (ii) the mixing of the MCMC output
are approximately equivalent to the corresponding superpopulation code. However, we
note that many things can substantially influence both the time taken for the code to
run and the mixing performance. These include (i) how we parameterize the model on
N, (ii) the sampling algorithms the programs choose and (iii) which software package
and version we use. We find that in general for models of this type, JAGS takes longer
to run than BUGS but mixes better.

We illustrate the code in Figure 1 with model Mh using the snowshoe hare data.
The model was fitted in JAGS 2.1, and density plots of N (Figure 2) can be compared
to those in Royle et al. (2007) and Link and Barker (2010).



Figure 1 about here 



Figure 2 about here 



3.1 Super Population approach and RJMCMC 

The starting point for the super population model is the complete data likelihood
specified in (4). To complete the specification of the model we specify a prior for $\psi$,
$f(\psi)$, as well as a random effects distribution for p. Royle et al. (2007) provide BUGS
code for fitting this model. The CDL that their BUGS code provides, referred to as a
data augmentation (DA) algorithm, is identical to that in (4) with the exception that
Royle et al. (2007) fix the first n values of $W_i$ at 1. Data augmentation now occurs
over a reduced subspace for W; that in which $W_1 = W_2 = \ldots = W_n = 1$. However, the
effect on the reduced CDL is simply the omission of the $\prod_h z_h!$ terms. Therefore, the
likelihood is equivalent, although as we discuss below a consequence of conditioning
on this one ordering of W is that the observed $z_h$'s cannot be modelled in the presence of
tag-reading uncertainty.
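For reference, the Gibbs updates implied by (4) for the inclusion indicators can be sketched in Python. This is a minimal illustration with hypothetical function names, assuming a Beta(1, 1) prior on $\psi$; individuals with $y_i > 0$ must have $w_i = 1$, and for $y_i = 0$ the two terms of (4) are $\psi (1 - p_i)^k$ for $w_i = 1$ and $1 - \psi$ for $w_i = 0$.

```python
import random


def prob_w_is_one(psi, p_i, k):
    """Pr(w_i = 1 | y_i = 0, p_i, psi): psi (1 - p_i)^k weighed against 1 - psi."""
    inside = psi * (1.0 - p_i) ** k
    return inside / (inside + (1.0 - psi))


def update_w(y, p, psi, k, rng):
    """Gibbs update of the indicators; individuals with y_i > 0 stay at 1."""
    return [
        1 if y_i > 0 else int(rng.random() < prob_w_is_one(psi, p_i, k))
        for y_i, p_i in zip(y, p)
    ]


def update_psi(w, rng):
    """psi | w ~ Beta(1 + N, 1 + M - N) under a Beta(1, 1) prior, with N = sum(w)."""
    N, M = sum(w), len(w)
    return rng.betavariate(1 + N, 1 + M - N)
```

In this scheme N is never sampled directly; it is recovered after each sweep as `sum(w)`, which is the sense in which it is a derived parameter.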



Royle et al. (2007) comment that the DA algorithm is an estimation approach
which fundamentally differs from other TD algorithms which are "model-selection
approaches" since the use of a TD algorithm treats each value N as a separate model.
Here we argue that this difference is semantic, and that in an algorithmic sense the two
approaches are equivalent since they can be set up to give the same Gibbs sampler.
To see the equivalence, we start with a superpopulation model similar to that
of Royle et al. (2007). The only difference is that we treat the capture probability as
undefined for any individual that is not in the population, that is, $p_i$ is undefined when
$w_i = 0$. It is clear that this model is TD, in the sense that changing the value of
$w_i$ changes the dimension of the capture probability vector p whose length is defined
by $N = \sum_i w_i$. This setup has $2^{M-n}$ possible "models" (by possible models we refer
to models with non-zero probability). That is, N is bounded below by n and has an upper
limit of M so that there are $M - n$ pseudo-individuals who were unobserved. Each of
these $2^{M-n}$ models describes a different way that the $M - n$ unobserved individuals
could potentially be included or excluded from the population. To complete the TD
specification we require a number of additional components. Using the language of
Gelman et al. (2004) pg 338-339, we require

1. Auxiliary random variables that make the parameter space equivalent for each of
these $2^{M-n}$ models. In particular, we need to ensure that we have M parameters
(equal to the dimension of p if all M individuals are included in the population).
For each of the $2^{M-n}$ models we need a distribution that describes how to generate
the auxiliary random variables when moving to another model.

2. A set of bijections that describe how the set of parameters and auxiliary variables 
in one model relates to the set of parameters and auxiliary variables in another 
model. 

3. Prior model probabilities. 

We specify these components as follows: 

1. Set the distribution for the auxiliary variables for each pair of models to be the
prior distribution $f(p \mid \theta_p)$. This simplifies the algorithm since this distribution is
the same for each of the auxiliary random variables in every model.

2. Specify identity bijections between each pair of models.

3. Set the prior model probability as proportional to $\psi^{N} (1 - \psi)^{M - N}$ for a model
with N individuals in the population and include a hierarchical Be(1, 1) prior
distribution for $\psi$.

If we implement this scheme using a TD Gibbs sampler (e.g. Link and Barker 2010,
pg. 146), where in each iteration we choose between pairs of these models that
correspond to $w_i = 0$ or $w_i = 1$, then the resulting TD algorithm has the same full
conditional distributions as the DA algorithm described in Royle et al. (2007). In this
sense, the TD algorithm described above is exactly the same as the equivalent DA
algorithm and any distinctions can be thought of as semantic.
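One way to see this, written out as a sketch under the component choices listed above (the symbol $u_i$ for the auxiliary variable is introduced only for this illustration): for an unobserved individual ($y_i = 0$), the model with $w_i = 1$ contributes $\psi (1 - p_i)^k f(p_i \mid \theta_p)$, while the model with $w_i = 0$ contributes $(1 - \psi) f(u_i \mid \theta_p)$, where $u_i$ fills the place of the undefined $p_i$. Under the identity bijection $u_i = p_i$ the density terms cancel:

```latex
\Pr(w_i = 1 \mid \cdot)
  = \frac{\psi (1 - p_i)^k f(p_i \mid \theta_p)}
         {\psi (1 - p_i)^k f(p_i \mid \theta_p) + (1 - \psi) f(u_i \mid \theta_p)}
  = \frac{\psi (1 - p_i)^k}{\psi (1 - p_i)^k + (1 - \psi)},
  \qquad u_i = p_i .
```

The right-hand side is the full conditional for $w_i$ implied by (4) when $y_i = 0$, so the TD step and the DA step draw from the same distribution.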

Any difference that does exist between the two approaches lies in the interpretation 
that can be placed on supplemental variables. In the fixed-dimension DA approach, 
the M — N supplemental capture probabilities correspond to phantom individuals that 
exist in a sense, but only in the 'otherworld' that is described by the superpopulation. 
Of necessity, these capture probabilities have the same prior as any other capture 
probability in the model. In the TD approach described above, as well as the standard
TD approach to Mh, the supplemental variables have only a dimension-matching role.
Their distribution is chosen solely to ensure an efficient Gibbs sampler and they have 
no interpretation as parameters within the context of the model. 

It is not surprising that the two methods correspond to the same Gibbs sampler,
since a crucial aspect of the RJMCMC algorithm as laid out by Green (1995) is the use
of auxiliary variables to match the dimension of the various models. In this case, we
can think of RJMCMC as using DA to match dimensions. So long as we choose these
auxiliary variables appropriately, as we did above, we are able to make the DA scheme
of Royle et al. (2007) correspond to a RJMCMC scheme by only considering RJMCMC
algorithms that use identity bijections between models.
4 Discussion

While the super-population model as implemented by Royle et al. (2007) is simple
and appropriate for most analyses we emphasize that extensions may be difficult or
problematic to incorporate in this modeling framework. The BUGS code we have
provided for the standard approach offers an easy solution if other hierarchical priors
are of interest for N.

Extensions to include changes such as relaxing the assumption of error-free tag
reading (e.g. Wright et al. 2009) are easy to include in the standard approach provided
an explicit reversible-jump step is included in the Gibbs sampling algorithm. It is
also important in this context that we do not condition on one ordering of the x (or
equivalently $y_i$ = the number of times individual i is caught) as is done in the BUGS
code of Royle et al. (2007) as well as the BUGS code in Figure 1. Irrespective of
the model we choose to use, care needs to be taken to ensure the combinatorial term
is correct for the given model. For the full likelihood approach, this means that we
include the full combinatorial term, as shown in (3), in the model when we sample
from the full conditional distributions for all unknowns, including the uncertain tags
(see Wright et al. (2009) for an example). It is less clear how we would proceed in the
superpopulation approach in the presence of tag error.

Being able to specify the model without having to explicitly include the
combinatorial terms is one of the advantages of the superpopulation approach. However, as we
have shown, while correct for the standard model, conditioning on $W_1 = 1, \ldots, W_n = 1$
results in a marginal likelihood that has the incorrect combinatorial term if there is
error in tag reading. In order to obtain valid inference in this setting, additional
adjustments to the superpopulation model will be required to ensure that the marginal
likelihood is correct.

We emphasise that although we have included an upper limit M on N in our
BUGS code, this is not required for the general TD algorithm. Also, during general
implementation we only need to sample values for auxiliary variables that are involved in
the change of dimension from N to N* rather than the $M - n$ sampled by the BUGS
code. This can result in substantial gains in efficiency when M is much larger than a
typical value for N, for example, when there is skew in the posterior for N.



We believe that the main advantage of the super-population approach as advocated
by Royle et al. (2007) is the fast and easy implementation of the model in BUGS.
However, an unappealing aspect of this approach is that N is not explicit in the likelihood
and its use appears to be restricted to implied priors for N that are based on binomial
mixture models. BUGS code for fast and easy implementation of the standard model
can also be written as we have shown.

Although we have focused here on the mark-recapture model Mh, the insights we
offer are not restricted to Mh but extend to other closed and open population
capture-recapture models, occupancy modeling, distance sampling, and any other problem that
involves inference about an unobserved number of draws from a series of categorical
distributions each with a distinct probability vector.
Symbol          Definition

k               the number of occasions on which the population is sampled.

N               population size.

M               superpopulation size (M > N).

P               an N x 1 random vector of capture probabilities with a realisation
                denoted by p.

$\mathcal{P}$   the sample space for P.

X               a random capture history matrix of dimension N x k with $X_{ij}$ the
                indicator for capture of individual i (i = 1, ..., N) in sample j
                (j = 1, ..., k). We use x to denote a realisation of X and $x^{obs}$ a
                value observed after sampling; $x^{obs}$ includes N - n rows of zeros
                which are treated as observed conditional on N.

$\mathcal{S}$   the sample space for X.

W               a random vector with $W_i$ taking value 1 if the ith member of the
                superpopulation is one of the N individuals that are in the study
                population. We denote a realisation of W by w.

$\mathcal{W}$   the sample space for W.

$z_h$           the number of individuals with capture history $h \in H$, where
                H = {11...11, 11...10, ..., 00...00}. We also refer to the null
                history 00...00 as 0.

Table 1: Notation
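1.  model {
2.    mu ~ dlogis(0,1)           # prior for the mean of logit(p)
3.    t ~ dt(0,1,2)              # t distribution with 2 degrees of freedom
4.    tau <- abs(t*5)            # scaled half-t; used as the precision of lp[i]
5.    for(i in 1:M){
6.      logit(p[i]) <- lp[i]
7.      lp[i] ~ dnorm(mu,tau)    # logit-normal random effects
8.      y[i] ~ dbin(pi[i],k)     # number of captures in k samples
9.      pi[i] <- p[i]*w[i]       # capture probability is 0 when w[i] = 0
10.     w[i] <- step(N-i)        # w[i] = 1 for i <= N, 0 otherwise
11.   }
12.   N ~ dcat(pee[1:M])         # prior f(N) on 1, ..., M
13.   n ~ dbin(0.00001,N)        # supplies the term proportional to N!/(N-n)!
14. }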



Explanation of the BUGS code:

Line 2-4 & 6-7 Distribution $f(p \mid \theta_p)$, including prior distributions on $\theta_p = (\mu, \tau)$.

Line 8-10 Modeling the observed capture histories - y[i] is the number of times individual
i was caught during k samples. We could model the capture indicators x[i,j]
if we prefer.

Line 12 Specifying f(N). Note that the starting value for N must satisfy N >= n. Here we
chose to use a discrete uniform prior on N so the values pee[i] are specified as data and
are all set to 1/M. Note that there are many different ways we can parameterize this
distribution and that this may affect the mixing speed as well as the time taken to run
the code.

Line 13 Including the term $\propto N!/(N-n)!$ using the approximation of Durban and Elston
(2005) who treat n as binomially distributed with index N and success probability
$\epsilon \approx 0$. We assume that tags are correctly read and so do not include $\prod_h z_h!$.
Another approach to including the normalizing constant is to include a variable zero=0
and then substitute line 13 with two lines: (i) zero ~ dpois(lam) and (ii) lam <-
-(logfact(N) - logfact(N-n) - logfact(M)).

Note that this model statement is appropriate for both BUGS and JAGS. We provide
alternative model statements and the data and initial values required for both BUGS and JAGS
for each of these modeling statements at www.maramatanga.com.

Figure 1: BUGS Code For Model Mh Using RJMCMC 
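The binomial device on line 13 of Figure 1 can be checked numerically. The Python sketch below (hypothetical function names; the values N = 90 and N = 91 are chosen only for illustration) compares how the binomial log-probability of n and the target term $\log\{N!/(N-n)!\}$ change when N increases by one. The two changes differ by $\log(1 - \epsilon)$, which is negligible for $\epsilon$ = 0.00001.

```python
from math import lgamma, log, log1p


def log_binom_pmf(n, N, eps):
    """log Pr(n | N, eps) for n ~ Binomial(N, eps)."""
    return (lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)
            + n * log(eps) + (N - n) * log1p(-eps))


def log_target(N, n):
    """log of N! / (N - n)!, the term the device is meant to supply."""
    return lgamma(N + 1) - lgamma(N - n + 1)


n, eps = 68, 0.00001
# Change in each quantity when N moves from 90 to 91
approx_step = log_binom_pmf(n, 91, eps) - log_binom_pmf(n, 90, eps)
exact_step = log_target(91, n) - log_target(90, n)
discrepancy = approx_step - exact_step  # equals log(1 - eps), about -1e-5
```

Because only changes in N matter to the sampler, the constant factors $\epsilon^n / n!$ in the binomial probability have no effect.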
Figure 2: Traceplot and Posterior Density for N for the snowshoe hare data fitted in JAGS
using the code in Figure 1